Summary. This paper presents a novel methodology to test the security of the Diffie-Hellman public key exchange protocol. The security of many cryptographic schemes relies on the hardness of the underlying problem. We present a purely statistical test to compare the hardness of this problem in different groups. We use subgroups of Z_p^* with p prime as the main example; however, the methods presented are not restricted to these groups. The presentation of the results is primarily intended to introduce novel applications of statistical methodologies to the area of mathematical cryptography. As such, we will emphasize the cryptographic aspects of the work more than the statistical notions.
1. Introduction.

Informally, through a key exchange protocol, two parties A and B agree on a common key K_{A,B} drawn from a set S while communicating over an insecure channel. Once the key is established, any further information shared between the parties is encoded, transmitted and decoded using the key K_{A,B}. The protocol is secure if any third party C with access to the initial communication between A and B cannot tell K_{A,B} apart from any other value in the set S. This guarantees that it is computationally infeasible for an outside adversary to gain "any" partial information on K_{A,B}.

The Diffie-Hellman key exchange protocol (Diffie and Hellman, 1976) is a primary example of a public key exchange protocol. In its most basic form, the protocol chooses a finite cyclic group (G, ·) of order N, with generator g, where · denotes the group operation. In what follows we use multiplicative notation for the group operation, and thus the group G is generated by the powers of g (i.e., G = {g^1, g^2, ..., g^N}), symbolically G = ⟨g⟩. Note that G, g and N are public information.

The participants in the information transfer, A and B, each choose an integer at random, a ∈ [1, N] and b ∈ [1, N] respectively, independently of each other. Then A computes g^a, B computes g^b, and they exchange these elements of G over an insecure channel. Since A knows a and B knows b, each of them can compute g^{ab} = (g^b)^a = (g^a)^b, which, or a publicly known derivation K_{A,B} of it, becomes the shared key.
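As a minimal sketch of the exchange just described, the Python snippet below runs one round in a toy group. The parameters p = 23 and g = 5 (a generator of Z_23^*, so N = 22) and the helper name `dh_exchange` are assumptions chosen for readability, not values from this paper; real deployments use groups of cryptographic size.

```python
import secrets

# Toy public parameters (assumed for illustration only): p = 23 is prime and
# g = 5 generates the cyclic group Z_23^*, whose order is N = p - 1 = 22.
P, G, N = 23, 5, 22

def dh_exchange(p: int = P, g: int = G, n: int = N) -> tuple[int, int, int, int]:
    """Run one toy Diffie-Hellman exchange; return (g^a, g^b, key_A, key_B)."""
    a = secrets.randbelow(n) + 1   # A's secret exponent, uniform on [1, N]
    b = secrets.randbelow(n) + 1   # B's secret exponent, uniform on [1, N]
    ga = pow(g, a, p)              # sent by A over the insecure channel
    gb = pow(g, b, p)              # sent by B over the insecure channel
    key_a = pow(gb, a, p)          # A computes (g^b)^a = g^{ab}
    key_b = pow(ga, b, p)          # B computes (g^a)^b = g^{ab}
    return ga, gb, key_a, key_b

ga, gb, key_a, key_b = dh_exchange()
print(f"public: g^a={ga}, g^b={gb}; shared key g^ab={key_a}")
```

Both parties end up with the same group element, while an eavesdropper only sees g, g^a and g^b.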



†The authors wish to thank Dr. Marco Lenci, who suggested to us the use of the entropy function as a quantifier for random information.
Any method of converting g^{ab} to K_{A,B} is publicly known, and the security of the key K_{A,B} depends directly on the security of g^{ab}; therefore, for the sake of simplicity, we will consider g^{ab} as the established key of the exchange for the rest of this paper.

In the present article we will be concerned with the security of this protocol. We will 
interpret security in a probabilistic manner and will devise a statistical test that will "assess" 
the security of the exchange in a given group. 

In the cryptology literature there are two concepts of security: core security and semantic security, the latter leading to various security models. Semantic security and the related concepts come under the name of "provable security" (Koblitz and Menezes, 2004, Section 2). The core security of the Diffie-Hellman key exchange protocol depends on the discrete logarithm problem, the computational Diffie-Hellman problem and the decision Diffie-Hellman problem. In this article we are concerned with the core security of the exchange. We give a brief introduction to the discrete logarithm problem and the computational Diffie-Hellman problem; for more on these, the reader can consult (Koblitz and Menezes, 2004, Section 5) or (Stinson, 2005, Chapter 6).



Assumption 1 (DL). For a cyclic group G generated by g, we are given g and g^n, n ∈ ℕ; it is hard to compute n.

Assumption 2 (CDH). Given g, g^a, g^b, it is hard to compute g^{ab}.

Clearly, if these assumptions are not satisfied then C, an adversary‡, can gain access to the key g^{ab}. The relationship between these two assumptions has been extensively studied. It is clear that the CDH assumption will not be satisfied in a group where finding the discrete logarithm is easy. In Maurer and Wolf (1999) and Boneh and Lipton (1996), the authors show that in several settings the validity of the CDH assumption and the hardness of the discrete logarithm problem are in fact equivalent.
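The observation that an easy discrete logarithm breaks CDH can be made concrete. The sketch below assumes the same kind of toy group (p = 23, g = 5, N = 22) and uses hypothetical helper names `discrete_log` and `break_cdh`; it solves DL by exhaustive search, which is feasible only for toy group orders, and then recovers g^{ab} = (g^b)^a from public values alone.

```python
def discrete_log(h: int, g: int, p: int, n: int) -> int:
    """Brute-force discrete logarithm: return x in [1, n] with g^x = h (mod p)."""
    acc = 1
    for x in range(1, n + 1):
        acc = acc * g % p
        if acc == h:
            return x
    raise ValueError("h is not in the subgroup generated by g")

def break_cdh(ga: int, gb: int, g: int, p: int, n: int) -> int:
    """Recover g^{ab} from the public values (g, g^a, g^b) by first solving DL."""
    a = discrete_log(ga, g, p, n)   # solving DL reveals A's secret exponent
    return pow(gb, a, p)            # (g^b)^a = g^{ab}: the CDH challenge is solved

# Toy check with hypothetical secrets a = 7, b = 13 in Z_23^* (generator 5, order 22)
ga, gb = pow(5, 7, 23), pow(5, 13, 23)
print(break_cdh(ga, gb, g=5, p=23, n=22) == pow(5, 7 * 13, 23))   # True
```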

Unfortunately, the DL and the CDH assumptions are not enough to ensure the security of the Diffie-Hellman key exchange protocol. Even if these assumptions are true, the eavesdropper C may still be able to gain useful information about g^{ab}. For example, if C can predict 90% of the bits in g^{ab} with high probability, then for all intents and purposes the key exchange protocol is broken. Moreover, there exist protocols whose security is broken by the knowledge of even one bit (for example, some electronic casino games). With the current state of knowledge we cannot be confident that, assuming only CDH, a scenario like the one described above does not exist (Boneh, 1998).



1.1. Our main contribution. 

Lemma 2.1 suggests that the security of the Diffie-Hellman exchange protocol is best studied from a statistical perspective. We introduce a statistical treatment of this particularly important problem in cryptography, and it is our hope that many more problems will be approached in a similar fashion.

We present novel methodologies to help assess the security of the Diffie-Hellman key exchange protocol in a given group (G, ·). In Section 2 we present the statistical criteria we use, as well as their relevance to and connection with the security assessment. Sections 3 and 4 present statistical tests to check the validity of the statistical criteria presented in Section 2. In particular, Subsection 4.1 details the use of the permutation testing methodology to calculate concrete values for the probability of Type I error of the tests. Section 4 also contains the important idea that the method can be used to compare the security of the DH key exchange protocol in two or more different groups. Furthermore, the groups used in the comparison do not need to have the same operational structure. Thus, in principle, it is possible to compare the security of the exchange in finite groups generated using elliptic curves versus prime subgroups of Z_n^*, n ∈ ℕ, of the same order. We do not pursue this direction in the present work.

‡There are various concepts of adversary in the cryptographic literature, differing in the power and authority they have. In this article we assume that our adversary is a passive eavesdropper.

Section 5 applies the methodology we develop to some examples where the security of the DH exchange has been conjectured. We find that the results obtained strengthen the conjectured hypotheses. Finally, in Section 6 we present general conclusions and directions of future research.

The treatment of the problem is based on the empirical distribution of the key g^{ab}. We mention that a better approach from the cryptographic perspective would be to look at the distribution of a collection of bits in the binary expansion of g^{ab}. We believe our methods could be extended and applied to this representation as well.

2. Statistical criteria to assess the security of the Diffie-Hellman key exchange protocol.

In its most basic form, described above, the security of the Diffie-Hellman key exchange protocol relies on the impossibility of even approximately identifying the key g^{ab} from the public information g, g^a, g^b. In statistical terms there exists a clear concept that answers the question of identification: statistical independence. Therefore a sufficient condition for the security of the DH key exchange is:

Assumption 3 (DH-Independence). Given a cyclic group G of order N generated by g, let a and b be chosen independently and uniformly at random from the set {1, 2, ..., N}. Then the random variables (g^a, g^b) and g^{ab} are independent.

For a given set S we will use the notation DU(S) to denote the discrete uniform distribution on the elements of S. With this notation, a and b are independent random variables with the DU({1, 2, ..., N}) distribution.

Clearly this is a sufficient condition for the security of the Diffie-Hellman key exchange protocol: there is no information to be gained about g^{ab} from seeing (g^a, g^b). Unfortunately, as one's intuition may indicate, this assumption is rejected for every finite group G we have looked at. In the next section we construct a statistical test for this assumption, which will help introduce the notation and the further testing procedures.

If the assumption presented above is not true, hope is not lost. The DH-Independence assumption is only a sufficient condition. In fact, in the cryptographic literature this assumption is not even mentioned; instead, a weaker, necessary condition is presented:

Assumption 4 (DDH). Given g, g^a, g^b and an element z ∈ G, it is hard to decide whether or not z = g^{ab}.



In this form the DDH assumption constitutes a necessary condition for the security of the Diffie-Hellman key exchange protocol. Furthermore, Joux and Nguyen (2003) construct groups based on elliptic curves where the DDH assumption is not satisfied, while the CDH and the discrete logarithm problem are proven to be equivalent and hard. This fact makes it necessary to check the validity of the DDH assumption directly for a given group.

The DDH assumption is made, either implicitly or explicitly, in many cryptographic systems and protocols. Applications include the many implementations of the DH key exchange itself (e.g., Diffie et al., 1992), the El-Gamal encryption scheme (El-Gamal, 1984), the undeniable signatures algorithm (Chaum and van Antwerpen, 1989), Feldman's verifiable secret sharing protocol (Feldman, 1987; Pedersen, 1991), and many others; we point to Naor and Reingold (1997) for a more detailed list.

Notice that the DDH assumption in the form presented above is somewhat vague because of the use of the predicate "hard to decide". Surprisingly, attempts to make the DDH assumption explicit were not made until well after its formulation in Diffie and Hellman (1976). The first ventures (Boneh and Lipton, 1996) use standard cryptographic machinery (Yao, 1982; Goldwasser and Micali, 1984) to express the assumption in terms of computational indistinguishability. Put in this traditional cryptographic form, it was discovered quickly by Stadler (1996), and independently by Naor and Reingold (1997), that if one assumes the existence of a polynomial time probabilistic algorithm which distinguishes the real key g^{ab} from the other possible values even with a very small probability§ (for all possible inputs), then another polynomial time algorithm can be constructed from the first which will output g^{ab} with a very large (almost one) probability. The only requirement is that the size of the group is known, a requirement relaxed by Boneh (1998), which only requires finiteness of the group.

All this evidence points toward a more specific definition based entirely on the notion of statistical significance. Indeed, this idea materialized in a series of papers (Canetti et al., 1999, 2000; Friedlander and Shparlinski, 2001; Vasco et al., 2004), which call this new form of the assumption the Diffie-Hellman Indistinguishability assumption (DHI). We note that Gennaro et al. (2004) and Joux and Nguyen (2003) use the same form but continue to call it DDH. We point the reader to Håstad et al. (1999) for a detailed discussion of statistical significance versus computational significance, in the context of pseudo-random number generation.

For our purposes of studying the security of the Diffie-Hellman exchange we will use the following assumption:

Assumption 5 (DHI). Given g, g^a, g^b, the distribution of g^{ab} is indistinguishable from the discrete uniform distribution on the elements of G, DU(G).

The notion of indistinguishability used here is the usual statistical one. Two variables are indistinguishable if they have essentially the same distribution; put formally, X_1 and X_2 are indistinguishable if their distribution functions F_i(x) = P(X_i ≤ x), i = 1, 2, have the property:

$$F_1(x) = F_2(x) \quad \text{for all } x \in \mathbb{R} \setminus (A_1 \cup A_2),$$

§But not negligible. For the sake of completeness we give the whole definition here; it is presented in a footnote since it is not relevant to our approach. Suppose that the group G where the exchange takes place has order N and n = log_2 N. A probabilistic algorithm 𝒜 is said to decide on the right key with small (non-negligible) probability if there exists a polynomial p(·) such that for any r ∈ G:

$$\operatorname{Prob}(\mathcal{A} \text{ outputs } g^{ab}) - \operatorname{Prob}(\mathcal{A} \text{ outputs } r) > \frac{1}{p(n)}.$$
where A_1 and A_2 are the sets containing the discontinuity points of F_1(·) and F_2(·), respectively. In our specific case the distributions are discrete, so the distribution functions F_1 and F_2 are step functions with jumps at a finite set of points of ℝ; using the right continuity of distribution functions, the usual definition then translates here into equality everywhere. We conclude that in our context indistinguishability means that the variables have the same distribution.

This formulation is perfectly natural for a statistician who tries to express the DDH formulation presented above. We note that our version of the DHI assumption requires that the conditional distribution of (g^{ab} | g^a, g^b, g) be uniform, while the previous articles (Canetti et al., 1999, 2000; Friedlander and Shparlinski, 2001; Vasco et al., 2004; Gennaro et al., 2004; Joux and Nguyen, 2003) require that the distribution of the triple (g^a, g^b, g^{ab} | g) be discrete uniform on the elements of G × G × G, i.e., DU(G^3). Given an outcome (x, y, z) we can write, using the simple multiplicative rule:

$$P(g^a = x, g^b = y, g^{ab} = z \mid g) = P(g^{ab} = z \mid g^a = x, g^b = y, g)\, P(g^a = x, g^b = y \mid g) \qquad (1)$$

Under the original condition that a and b are DU({1, ..., N}), and since g is a generator of G, the distribution of (g^a, g^b | g) is DU(G^2), so the second factor in (1) equals 1/N^2 for every (x, y). Consequently the left-hand side of (1) equals 1/N^3 for every (x, y, z) exactly when the first factor equals 1/N for every z, and the two formulations are perfectly equivalent.

It is known that in general statistical indistinguishability implies computational indistinguishability¶, but the reverse is not true in general (Goldreich, 2001, Section 3.2.2). The following lemma states the same result in our specific case, using the assumptions presented in this section: DHI and DDH.

Lemma 2.1. In a group G of order N, if the DHI assumption is true then the DDH assumption is true as well.

Proof. Assume that DHI is true in G. Then, for given g^a, g^b, the probability P(g^{ab} = z | g^a, g^b) = 1/N for any z ∈ G, so no strategy for deciding whether z = g^{ab} can do better than a uniform guess. This is the hardest possible scenario in the DDH assumption, and hence DDH is satisfied.

This lemma says that in any group G, DHI is a stronger condition than DDH. Looking at the proof closely, we see that the difference between the two consists in the fact that the DHI assumption provides, via the uniform distribution, a measure of the hardness that the DDH assumption only describes qualitatively.



3. Testing for DH-Independence.

We give general definitions first, then turn to our specific case.

Let X, Y and Z be three discrete random variables taking values in the sets {x_1, x_2, ..., x_n}, {y_1, y_2, ..., y_m} and {z_1, z_2, ..., z_l}, respectively. Denote by

$$p(x_i, y_j, z_k) = P\{X = x_i, Y = y_j, Z = z_k\}, \quad \forall\, i, j, k,$$

the joint probability function corresponding to (X, Y, Z). With the usual notation we denote by p(y_j | x_i), p(x_i, y_j | z_k), etc. the conditional probability functions of Y | X, (X, Y) | Z, etc. Furthermore, assume that for all k ∈ {1, 2, ..., l} the marginal probability p(z_k) = P{Z = z_k} ≠ 0, to avoid the complications of conditioning on a set of measure zero.

¶Or at least as strong.
Definition 3.1 (Entropy). We define the joint and conditional measures of uncertainty:

$$H(X,Y) = -\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{l} p(x_i, y_j, z_k)\log p(x_i, y_j) \qquad (2)$$

$$H(X,Y \mid Z) = -\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{l} p(x_i, y_j, z_k)\log p(x_i, y_j \mid z_k), \qquad (3)$$

with the convention 0 · (−∞) = 0.

In the above definition we choose to work with the natural logarithm; however, any other base would be equivalent for our purpose, due to the constant in the usual definition of the entropy function (see Shannon, 1948).

Lemma 3.2. The following property holds for the above uncertainty measures: H(X, Y | Z) ≤ H(X, Y), with equality if and only if (X, Y) and Z are independent.

The proof is an easy exercise in probability; the reader is directed to Shannon (1948) or Rokhlin (1967) for more details.
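Definitions (2) and (3) and Lemma 3.2 can be checked numerically on a small joint distribution. The sketch below is illustrative only: the function name `entropies` and the dictionary representation of the joint probability function are assumptions, and natural logarithms are used as in the text.

```python
import math
from collections import defaultdict

def entropies(joint: dict) -> tuple[float, float]:
    """Return (H(X,Y), H(X,Y|Z)) in nats, per (2) and (3), for a joint pmf.

    `joint` maps each triple (x, y, z) to p(x, y, z); missing triples have
    probability zero and contribute nothing (the convention 0 * (-inf) = 0).
    """
    p_xy = defaultdict(float)   # marginal p(x, y)
    p_z = defaultdict(float)    # marginal p(z)
    for (x, y, z), p in joint.items():
        p_xy[(x, y)] += p
        p_z[z] += p
    h_xy = h_xy_given_z = 0.0
    for (x, y, z), p in joint.items():
        if p > 0:
            h_xy -= p * math.log(p_xy[(x, y)])         # term of equation (2)
            h_xy_given_z -= p * math.log(p / p_z[z])   # p(x,y|z) = p(x,y,z)/p(z), eq. (3)
    return h_xy, h_xy_given_z

# Z = X xor Y for two fair, independent bits: Z is a function of (X, Y), so
# conditioning on Z lowers the uncertainty about (X, Y), as Lemma 3.2 predicts.
xor_joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
print(entropies(xor_joint))   # H(X,Y) = log 4 > H(X,Y|Z) = log 2
```

Replacing `xor_joint` by a distribution in which Z is independent of (X, Y) gives equal values, which is the equality case of the lemma.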



Lemma 3.2 gives a clear criterion for our first test. More specifically, assume that the number of elements in G is N, i.e., |G| = N; as an example, |Z_p^*| = p − 1. The plan is to apply the above lemma with X = g^a, Y = g^b and Z = g^{ab}. Since both participants in the Diffie-Hellman protocol choose a and b at random and g is the generator of G, we can assume that g^a and g^b are independent and that their distribution is DU(G). Thus, the distribution of (g^a, g^b) is DU(G × G). This in turn implies that p(x_i, y_j) = 1/N^2 for all i, j ∈ {1, ..., N}, and thus the first entropy measure (2) becomes:

$$H(g^a, g^b) = -\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=1}^{N} p(x_i, y_j, z_k)\log\frac{1}{N^2} = 2\log N \qquad (4)$$

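At this point we can devise a test of the hypotheses:

$$\begin{cases} H_0: & g^{ab} \text{ is independent of } (g^a, g^b) \\ H_a: & g^{ab} \text{ is NOT independent of } (g^a, g^b) \end{cases} \qquad (5)$$

using Lemma 3.2. The test in (5) is equivalent to:

$$\begin{cases} H_0: & H(g^a, g^b \mid g^{ab}) = 2\log N \\ H_a: & H(g^a, g^b \mid g^{ab}) < 2\log N \end{cases} \qquad (6)$$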



The question is: how do we proceed with this test? Since all the distributions are finite, in theory at least, we could calculate p(x_i, y_j | z_k) for all the possible triples in G × G × G = G^3. If we had these quantities it would be a simple matter to calculate H(g^a, g^b | g^{ab}) according to (3).

Denote this value, based on the whole set G^3, by T_N. The test will then compare this value with 2 log N. If they are equal, the variables are independent and the DH-Independence assumption is satisfied. If T_N is smaller, then by Lemma 3.2 the variables are not independent and the assumption fails.
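For a small group the full-enumeration value T_N can be computed directly. The sketch below assumes G = Z_p^* for a small prime p and uses hypothetical helper names. It relies on two facts from the setup above: each pair (a, b) has probability 1/N^2, and distinct pairs give distinct (g^a, g^b), so p(g^a, g^b | g^{ab} = z) = 1/m_z, where m_z counts the pairs with g^{ab} = z. Enumeration costs N^2 modular exponentiations, which is why Remark 3.4 calls for estimation in larger groups.

```python
import math
from collections import Counter

def find_generator(p: int) -> int:
    """Smallest generator of Z_p^* for a prime p (trial factorisation of p - 1)."""
    n = p - 1
    factors, m, d = set(), n, 2
    while d * d <= m:
        while m % d == 0:
            factors.add(d)
            m //= d
        d += 1
    if m > 1:
        factors.add(m)
    for g in range(2, p):
        # g generates Z_p^* iff g^((p-1)/q) != 1 for every prime q dividing p - 1
        if all(pow(g, n // q, p) != 1 for q in factors):
            return g
    raise ValueError("no generator found; is p prime?")

def independence_statistic(p: int) -> float:
    """T_N = H(g^a, g^b | g^{ab}) over all N^2 pairs (a, b), with N = p - 1."""
    n = p - 1
    g = find_generator(p)
    # m_z = number of pairs (a, b) with g^{ab} = z; each pair has mass 1/N^2
    m = Counter(pow(g, a * b, p) for a in range(1, n + 1) for b in range(1, n + 1))
    # Since g^{ab} is a function of (g^a, g^b), this equals 2 log N - H(g^{ab}),
    # so the value always lies in [log N, 2 log N).
    return sum(mz / n**2 * math.log(mz) for mz in m.values())

p = 23
print(independence_statistic(p), math.log(p - 1), 2 * math.log(p - 1))
```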

At this point let us make two important remarks. 
Remark 3.3. In the definition of the entropy functions (2) and (3) we did not use the structure of the group G in any way, only the relative frequencies of the elements in the group. This fact makes the methods based on the entropy function well suited for comparison between diverse groups. We will take advantage of this feature later in this paper.

Remark 3.4. In practice, if we wish to calculate T_N we have to calculate all the possible values of (g^a, g^b), and this will take longer than an exhaustive search. Thus calculating T_N is not practical; instead we have to estimate it. We detail the estimation in the next section.

Alas, as we suspected from the beginning, implementing this first test tells us that (g^a, g^b) and g^{ab} are not independent in any group that we tried. For example, in Z_p^* with multiplication, calculating T_N, the entropy in (3), for p ∈ {1193, 2131, 11093} yields values which are far from 2 log(p − 1). In fact, the values obtained are close to log(p − 1), so the value of our first test increases with p. The closeness of the test value to log(p − 1) is an interesting experimental fact, which is investigated and explained by our second test, presented in the next section.



4. Testing the DHI assumption 

If the DH-Independence assumption were satisfied in a given group G, we could stop and decide that we had found a perfect group for the Diffie-Hellman key exchange. However, the experimental procedures and our intuition indicate that the DH-Independence assumption is never satisfied in any finite group G. The next task is to obtain a statistical testing procedure to verify the validity of the DHI assumption in a given group G. The idea is to use the entropy function, in the sense of the Kullback-Leibler divergence (Kullback and Leibler, 1951), as a measure of departure from the entropy calculated under the hypothesis of a uniform distribution. Specifically, using the earlier notation, we wish to construct a statistical test that will check the validity of the following hypotheses:

$$\begin{cases} H_0: & \text{the distribution of } g^{ab} \mid (g^a, g^b) \text{ is } DU(G) \\ H_a: & \text{the distribution of } g^{ab} \mid (g^a, g^b) \text{ is NOT } DU(G) \end{cases} \qquad (7)$$

Let us denote the elements of G by {g_1, g_2, ..., g_N}. Suppose we can look at all the possible triples (g^a, g^b, g^{ab}) as a, b ∈ {1, 2, ..., N} take all the possible values. Clearly, there are N^2 such triples and, assuming that a and b are chosen at random, each such triple has probability 1/N^2. The last element in the triple, g^{ab}, is mapped into N possible values (the elements of G), so some values in G will be repeated. For an element g_k ∈ G, denote by m_k the number of times g_k appears in the place of g^{ab} among all the N^2 triples. We then have Σ_k m_k = N^2, and for each pair (g^a, g^b) that corresponds to g^{ab} = g_k we can calculate the conditional probability as:

$$p(g^a = g_i, g^b = g_j \mid g^{ab} = g_k) = \frac{1}{m_k}\,\mathbf{1}_A(g_i, g_j, g_k),$$

where A ⊆ G^3 is the set of all N^2 possible triples (g^a, g^b, g^{ab}), and 1_A denotes the indicator function of A, i.e., 1_A : G^3 → {0, 1} is given by:

$$\mathbf{1}_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases}$$
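We can continue:

$$H(g^a, g^b \mid g^{ab}) = -\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=1}^{N} p(g_i, g_j, g_k)\log p(g_i, g_j \mid g_k) = \sum_{k=1}^{N}\sum_{i,j=1}^{N} \frac{1}{N^2}\,\mathbf{1}_A(g_i, g_j, g_k)\log m_k = \sum_{k=1}^{N} \frac{m_k}{N^2}\log m_k. \qquad (8)$$

Under the null hypothesis H_0, the distribution of g^{ab} | (g^a, g^b) is uniform; therefore the multiplicities should be equal. This automatically implies that m_k = N for all k, and then the entropy function in (8) is:

$$H(g^a, g^b \mid g^{ab}) = \sum_{k=1}^{N} \frac{N}{N^2}\log N = \log N.$$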

The test statistic is:

$$T_N = H(g^a, g^b \mid g^{ab}) - \log N = \sum_{k=1}^{N} \frac{m_k}{N^2}\log m_k - \log N. \qquad (9)$$

This test is based on the whole set of values in G^3. Accordingly, if the value of the test equals zero then the null hypothesis H_0 is true; any other value of the test supports the alternative hypothesis. We summarize this result in the following:

Lemma 4.1 (Testing Procedure). With the previous notation, if T_N = 0 then the DHI assumption is satisfied in the given group G.
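Equation (9) can be evaluated exactly for small cyclic subgroups of Z_p^*. In the sketch below the function name and the toy parameters are assumptions: g = 5 generates Z_23^* (order 22) and g = 2 generates its subgroup of order 11. A short calculation shows that T_N equals the Kullback-Leibler divergence of the distribution of g^{ab} from DU(G), so it is never negative and is zero exactly when every m_k equals N. The two toy groups have different orders, so by Subsection 4.2 their values are not meant to be compared with each other.

```python
import math
from collections import Counter

def dhi_statistic(g: int, p: int, n: int) -> float:
    """Equation (9): T_N = sum_k (m_k / N^2) log m_k - log N.

    g generates a cyclic subgroup of Z_p^* of order n; m_k counts the pairs
    (a, b) in {1..n}^2 with g^{ab} = g_k. T_N = 0 iff every m_k equals n.
    """
    m = Counter(pow(g, a * b, p) for a in range(1, n + 1) for b in range(1, n + 1))
    return sum(mk / n**2 * math.log(mk) for mk in m.values()) - math.log(n)

print(dhi_statistic(5, 23, 22), dhi_statistic(2, 23, 11))   # approximately 0.164 and 0.033
```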

Both Remarks 3.3 and 3.4 certainly apply to this testing procedure as well. In particular, Remark 3.4 means that we have to find procedures to estimate T_N instead of calculating it. Estimation introduces sampling distributions, and we detail the approach next.



4.1. The permutation test approach.

Assume that we can obtain a sample of n pairs {(a_i, b_i)}_{i ∈ {1, 2, ..., n}} from {1, 2, ..., N} × {1, 2, ..., N}. For each pair in the sample we can calculate the triple (g^{a_i}, g^{b_i}, g^{a_i b_i}). Let A_n be the set of all the triplets in the sample.

Using (8) we can calculate an estimate of H(g^a, g^b | g^{ab}) using:

$$\hat{p}_n(g_i, g_j, g_k) = \frac{k_{ijk}}{n}\,\mathbf{1}_{A_n}(g_i, g_j, g_k), \qquad \hat{p}_n(g_i, g_j \mid g_k) = \frac{k_{ijk}}{\hat{m}_k}\,\mathbf{1}_{A_n}(g_i, g_j, g_k), \qquad (10)$$

where once again m̂_k denotes the multiplicity of g_k, but in the given sample of n observations. We take into account the possibility of obtaining repeated observations in the sample by multiplying by the factor k_{ijk}, which represents the number of times we see the same observation (g_i, g_j, g_k) in our sample.
The test statistic is:

$$\hat{T}_n = -\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=1}^{N} \hat{p}_n(g_i, g_j, g_k)\log \hat{p}_n(g_i, g_j \mid g_k) - \log n \qquad (11)$$
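A direct transcription of the estimator (10) and the statistic (11) is sketched below. It assumes the sample is given as a list of observed triples (g^a, g^b, g^{ab}), possibly with repeats; the function name is hypothetical. Only triples present in A_n contribute, since all other terms vanish by the convention 0 · (−∞) = 0.

```python
import math
from collections import Counter

def estimate_T_n(triples: list[tuple[int, int, int]]) -> float:
    """Equations (10)-(11): plug-in estimate of H(g^a, g^b | g^{ab}) minus log n.

    `triples` is the sample A_n of observed (g^a, g^b, g^{ab}); repeated
    observations are allowed and counted through k_ijk.
    """
    n = len(triples)
    k = Counter(triples)                       # k_ijk: repeats of each distinct triple
    m = Counter(z for (_, _, z) in triples)    # m_k: sample multiplicity of each g_k
    # p_n(i,j,k) = k_ijk / n and p_n(i,j | k) = k_ijk / m_k, as in (10)
    h = -sum(kijk / n * math.log(kijk / m[z]) for (_, _, z), kijk in k.items())
    return h - math.log(n)

print(estimate_T_n([(1, 1, 1), (2, 2, 1), (2, 2, 1)]))
```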

All that is left is to investigate the distribution of T̂_n under the null hypothesis H_0. Under the null hypothesis the m̂_k's are the multiplicities of the g_k's in a sample of size n drawn from the set

{g_1, ..., g_1, g_2, ..., g_2, ..., g_N, ..., g_N},

where each element of the group G is repeated N times.

Let us denote by M_1, M_2, ..., M_N the multiplicities of the elements {g_1, g_2, ..., g_N} in a sample of size n. It is not hard to show that the joint probability distribution of (M_1, ..., M_N) is the so-called multivariate hypergeometric distribution:

$$P(M_1 = m_1, \ldots, M_N = m_N) = \frac{\binom{N}{m_1}\binom{N}{m_2}\cdots\binom{N}{m_N}}{\binom{N^2}{n}}, \qquad m_1 + \cdots + m_N = n.$$
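The multivariate hypergeometric probabilities above can be evaluated with exact binomial coefficients. The sketch below uses a hypothetical function name and requires Python 3.8+ for `math.comb` and `math.prod`; it takes the multiplicities (m_1, ..., m_N) and the group order N, with sample size n = m_1 + ... + m_N. It is practical only for small N, which is one reason a simulation approach is preferred in the procedure that follows.

```python
import math

def mv_hypergeom_pmf(counts: list[int], n_group: int) -> float:
    """P(M_1 = m_1, ..., M_N = m_N) for n draws without replacement from N^2
    items in which each of the N group elements appears exactly N times."""
    if len(counts) != n_group:
        raise ValueError("need one multiplicity per group element")
    n = sum(counts)   # sample size
    numerator = math.prod(math.comb(n_group, mk) for mk in counts)
    return numerator / math.comb(n_group**2, n)

print(mv_hypergeom_pmf([1, 1], 2))   # 2/3: population {g1, g1, g2, g2}, two draws
```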
The test statistic under H_0 is:

$$\tilde{T}_n = \sum_{k=1}^{N} \frac{M_k}{n}\log M_k - \log n. \qquad (12)$$

If we were able to calculate the distribution of T̃_n knowing that (M_1, M_2, ..., M_N) is multivariate hypergeometric, then we would be in a position to reach the conclusion of the test of uniformity (7) by calculating the p-value of the test statistic (11) using this distribution.

Finding the distribution of the test statistic under H_0 is, however, not an easy task. This is the reason we propose the use of permutation testing, for which knowledge of this distribution is not necessary.

The permutation testing procedure generates samples (M_1, M_2, ..., M_N) from the multivariate hypergeometric distribution. For each sample, it calculates the corresponding value of the test statistic under the null hypothesis as in (12). These values are obtained under the assumption that H_0 is true; this allows us to calculate the empirical distribution of the statistic T̃_n under the null hypothesis. The p-value of our test is given by the proportion of values as extreme as, or more extreme than, the one calculated in (11) using the group G.

A small p-value is evidence against the null hypothesis in (7), that the sample comes from a uniform distribution. We summarize the procedure below:

Testing procedure to determine the validity of DHI for a group G

(i) We take a sample of size n and calculate the test statistic as in (11).

(ii) We generate many test statistic values under the hypothesis that H_0 is true using (12), then construct their empirical distribution.

(iii) We calculate the proportion of values in the empirical distribution found in (ii) that are lower than the test value found using G in (i); the p-value of the test is one minus this proportion, i.e., the proportion of values at least as large as the test value.

(iv) If the p-value is small we reject the DHI assumption. If the p-value is large we did not find evidence that the DHI assumption fails in the given group G.
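Steps (i) to (iii) can be sketched as follows under stated assumptions. The group is a cyclic subgroup ⟨g⟩ of Z_p^* small enough to index all pairs; the n pairs (a, b) are drawn without replacement, so every k_ijk = 1 and the observed statistic (11) takes the same form as (12); and the function names and toy parameters (p = 23, g = 5, N = 22, n = 300) are hypothetical. The null distribution is approximated by Monte Carlo draws from the multivariate hypergeometric distribution. Larger values of the statistic correspond to more concentrated multiplicities, so the p-value counts null values at least as large as the observed one.

```python
import math
import random
from collections import Counter

def entropy_stat(multiplicities) -> float:
    """Equation (12): sum_k (M_k / n) log M_k - log n, skipping zero counts."""
    n = sum(multiplicities)
    return sum(m / n * math.log(m) for m in multiplicities if m > 0) - math.log(n)

def dhi_permutation_test(g: int, p: int, order: int, n: int,
                         reps: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Steps (i)-(iii) for the cyclic group <g> of the given order inside Z_p^*.

    Returns (observed statistic, p-value). Pairs (a, b) are drawn WITHOUT
    replacement (an assumption of this sketch), so every k_ijk = 1 and the
    observed statistic (11) takes the same form as (12).
    """
    rng = random.Random(seed)
    # (i) sample n distinct pairs (a, b) in {1..order}^2 and count each g^{ab}
    cells = rng.sample(range(order * order), n)
    counts = Counter(pow(g, (c // order + 1) * (c % order + 1), p) for c in cells)
    observed = entropy_stat(counts.values())
    # (ii) under H0 each element fills exactly `order` of the order^2 cells, so the
    # multiplicities are multivariate hypergeometric: draw n labels without
    # replacement from a population where each of the `order` labels appears `order` times
    population = [k for k in range(order) for _ in range(order)]
    null = [entropy_stat(Counter(rng.sample(population, n)).values())
            for _ in range(reps)]
    # (iii) the p-value is the share of null values at least as large as the observed one
    p_value = sum(v >= observed for v in null) / reps
    return observed, p_value

obs, pval = dhi_permutation_test(g=5, p=23, order=22, n=300, reps=200)
print(f"statistic = {obs:.4f}, p-value = {pval:.3f}")
```

Fixing the seed makes a run reproducible; in a real comparison between groups the same sample size n would be used for every group, as required in Subsection 4.2.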



4.2. How to compare two or more groups?

We note at this point that the absolute value of the test, |T_N|, and of its estimate, |T̂_n|, represent a measure of departure from the discrete uniform distribution. The bigger the estimate, the further the distribution is from uniform and the weaker the validity of the DHI assumption. Remark 3.3 also tells us that the nature of the group operation is irrelevant for the testing procedure. Therefore, we can use the test as a tool to compare the strength of the Diffie-Hellman key exchange protocol in two or more groups. To be able to do so, we need the orders of the groups being compared to be similar and, more importantly, the sample size on the basis of which we calculate the permutation test to be the same. We take advantage of the ability to compare different groups in the next section.



5. Testing the DHI assumption in Z_p^*

We are going to check the efficiency of the testing procedure for the most widely used finite groups, those included in Z_p^* with the multiplicative operation. We present the following examples as a way of checking the validity of the testing procedure.

Example 5.1 (A group where the DDH assumption does not hold). Consider G = Z_p^* with p prime. It is known that computing the Legendre symbol in this group gives a distinguisher against DDH (Gennaro et al., 2004).
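A minimal sketch of this distinguisher, assuming g generates all of Z_p^* (toy values p = 23, g = 5; the helper names are hypothetical): g^x is a quadratic residue exactly when x is even, so the Legendre symbols of g^a and g^b reveal the parities of a and b, hence whether ab is even, i.e., the Legendre symbol of g^{ab}. The symbol is computed with Euler's criterion.

```python
def legendre(x: int, p: int) -> int:
    """Legendre symbol (x/p) for an odd prime p and x not divisible by p (Euler's criterion)."""
    s = pow(x, (p - 1) // 2, p)     # equals 1 or p - 1 when p does not divide x
    return -1 if s == p - 1 else s

def predicted_symbol(ga: int, gb: int, p: int) -> int:
    """g^{ab} is a quadratic residue iff a or b is even, i.e. iff g^a or g^b is one."""
    return 1 if legendre(ga, p) == 1 or legendre(gb, p) == 1 else -1

def ddh_guess(ga: int, gb: int, z: int, p: int) -> bool:
    """Guess whether z = g^{ab}: answer 'yes' iff z has the predicted Legendre symbol."""
    return legendre(z, p) == predicted_symbol(ga, gb, p)

# Toy check in Z_23^* with generator 5 (hypothetical parameters)
p, g = 23, 5
real = all(ddh_guess(pow(g, a, p), pow(g, b, p), pow(g, a * b, p), p)
           for a in range(1, p) for b in range(1, p))
print(real)   # True: the real key always has the predicted symbol
```

A true key always passes this check, while a uniformly random z ∈ G passes only half the time, which gives the distinguisher a non-negligible advantage.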



Example 5.2 (A group where the DDH assumption is conjectured to hold). We currently do not know any DDH distinguisher for a prime order subgroup of Z_p^*. Therefore, given primes p and q with q a divisor of p − 1, it is conjectured that the DDH assumption holds in the subgroup of order q of Z_p^*.
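For experiments with Example 5.2 one needs a generator of the order-q subgroup. A minimal sketch, assuming p and q are primes with q dividing p − 1 (primality is not verified by the code) and using a hypothetical function name: for any h, the element g = h^{(p−1)/q} satisfies g^q = 1, so g generates the subgroup whenever g ≠ 1.

```python
def prime_order_subgroup(p: int, q: int) -> int:
    """Return a generator of the unique subgroup of order q in Z_p^*,
    assuming p and q are primes with q dividing p - 1."""
    if (p - 1) % q:
        raise ValueError("q must divide p - 1")
    for h in range(2, p):
        g = pow(h, (p - 1) // q, p)   # g^q = h^(p-1) = 1, so the order of g divides q
        if g != 1:                     # q prime and g != 1, hence g has order exactly q
            return g
    raise ValueError("no generator found; are p and q prime?")

g = prime_order_subgroup(23, 11)
print(g, pow(g, 11, 23))   # 4 1
```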

We start with a given group G and using the test presented in the previous section we 
will test for the validity of the DHI assumption in that group G. This should provide a 
strong indication towards the security of the Diffie-Hellman key exchange protocol in that 
group. 



5.1. The rate of convergence of the testing procedure

The first thing we investigate is the rate of convergence of our test. To do this we need to calculate the true value of T_N, and thus we have to look at small groups.

For space considerations we present only the results obtained for p = 1193, in Table 1 in the Appendix. The sample sizes are presented in the first column of the table and the corresponding sample entropy values T̂_n in column two. Column three presents the proportion of null values lower than T̂_n; an entry equal to 1 corresponds to a p-value of 0. The fourth column represents the distance from T̂_n to the center of the distribution of entropy values calculated under H_0. Finally, the last column represents the ratio of the distance in column four to the distance from the sample entropy T̂_n to the furthest point in the distribution. It is an indication of how many standard deviations away T̂_n is from the distribution.

There are two remarkable features of these values. First, the test rejects the null hypothesis that the distribution of (g^{ab} | g^a, g^b) is uniform on the elements of G, and it does so based on a sample of only 354 values, about one third of N = 1192. Second, if we wish to determine the actual entropy distance between the two distributions (a feature that will be useful when comparing two or more groups), we can see that starting with a sample size of n = 3304 (about 3 times N) we obtain accurate results.

Fig. 1. Comparison of the test values for different sample sizes and groups Z_p^*.

To better illustrate the rate of convergence for some other groups, we plot in Figure 1 the evolution of the test values with the size of the sample. This figure suggests that the sample size needed for a good estimate of T_N depends on the size of the group; for example, we need a larger sample size for Z_{11093}^* than we need for Z_{1193}^*. In addition, the same figure points out another interesting fact.

Following Example 5.1, we know that Z_p^* is not secure. It is also conjectured that some groups are more secure than others. Looking at the problem from the perspective of which groups are more easily broken using the Legendre symbol, it is also assumed that increasing the size of the group makes the group more secure.

We can see from the figure that the second assertion is not true: just increasing the size of the group does not make it more secure. Remembering that a smaller relative distance corresponds to closeness to the discrete uniform distribution on the elements of G, we see from Figure 1 that while Z_{11093}^*, the largest group, is the most secure of the three, the situation between the other two groups is not what we would have expected looking at the size of the group alone. Even though Z_{2131}^* is the larger group (almost twice the size), it is also less secure from the DHI assumption perspective than Z_{1193}^*. This indicates that the choice of the group G, rather than its size, is essential for the security of the Diffie-Hellman key exchange protocol.



5.2. Comparison of the DHI assumption across groups. 

Next we wished to give an indication of groups that are more secure than others. It is known that, considering only the Legendre symbol criterion, the safest groups among the Z_p^* are the ones obtained when p is a safe prime, i.e., of the form p = 2q + 1 where q is another prime.
Fig. 2. Histogram of all the test values for Z*_p with 2000 < p < 4000. Values closer to zero represent
safer groups for DH exchange.

We wished to test this theory on a large set of Z*_p groups with varying p. We looked
at all primes between 2000 and 4000, and again at all primes between 9000 and 11000. We
use two separate ranges of primes because we expect the results to be consistent between
them. Figures 2 and 3 show the distribution of the test values for these groups, separated
into safe and non-safe primes.
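The selection of groups described above can be reproduced with a small sketch. It lists the primes in a range and splits them into safe primes (p = 2q + 1 with q prime) and the remaining primes. Trial division is an assumption chosen for simplicity. It is adequate for ranges such as 2000 to 4000 or 9000 to 11000 but not for cryptographic sizes. The helper names `is_prime` and `split_safe_primes` are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Deterministic trial-division primality test (fine for small n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def split_safe_primes(lo: int, hi: int) -> tuple[list[int], list[int]]:
    """Return (safe, other): the primes in [lo, hi] that are safe primes, and the rest."""
    safe: list[int] = []
    other: list[int] = []
    for p in range(lo, hi + 1):
        if is_prime(p):
            # p is safe when q = (p - 1) / 2 is itself prime.
            (safe if is_prime((p - 1) // 2) else other).append(p)
    return safe, other


for lo, hi in [(2000, 4000), (9000, 11000)]:
    safe, other = split_safe_primes(lo, hi)
    print(f"[{lo}, {hi}]: {len(safe)} safe primes, {len(other)} other primes")
```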

First, we notice that the behavior of the primes in the range 2000 to 4000 is very similar
to that of the primes in the higher range 9000 to 11000. Second, the same conclusion applies
in both ranges: the safe prime groups are more secure than all the other groups. However,
the test estimate obtained for each of the safe prime groups is significantly different from
zero; therefore, there is no safe prime group in the given ranges for which the DHI assumption
is verified. This seems to confirm the assertion in Example 15.11.

Fig. 3. Histograms of test values obtained for Z*_p with 9000 < p < 11000. Values closer to zero
represent safer groups for DH exchange.

Next, we turn to Example 15.21. We apply our test to the prime subgroup of each of the
safe prime groups in the range 9000 to 11000. More specifically, for each Z*_p with p = 2q + 1
a safe prime, we construct the subgroup of prime order q, and we then test the DHI
assumption in each subgroup thus constructed. The resulting distances are plotted in the
upper histogram of Figure 4. The behavior of the test values for primes between 2000 and
4000 was very similar; for reasons of space we omit the corresponding plot. All the values
are obtained using the same sample size n = 8 X 10^. The reason for this particular value is
that, while the groups themselves are in the range 9000 to 11000, the subgroups have order
q = (p - 1)/2, that is, 4500 to 5500.
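The construction of the prime subgroup can be sketched as follows. For a safe prime p = 2q + 1, every square satisfies (x^2)^q = x^(p-1) = 1. Because q is prime, any square other than 1 therefore has order exactly q, and the subgroup it generates is the set of quadratic residues. The sketch below uses a hypothetical helper `prime_order_subgroup` with generator h = 4 = 2^2. This is one valid choice for any safe prime p >= 5, not necessarily the generator used in the experiments reported here.

```python
def prime_order_subgroup(p: int) -> tuple[int, int, list[int]]:
    """For a safe prime p = 2q + 1 (p >= 5), return (q, h, elements), where h
    generates the subgroup of Z*_p of prime order q (the quadratic residues).

    Assumes, without checking, that p is a safe prime.
    """
    if p < 5 or p % 2 == 0:
        raise ValueError("expected an odd safe prime p >= 5")
    q = (p - 1) // 2
    h = 4  # 2^2 is a square different from 1, so its order is exactly q
    elements = []
    x = 1
    for _ in range(q):  # list h^0, h^1, ..., h^(q-1)
        elements.append(x)
        x = x * h % p
    return q, h, elements


q, h, subgroup = prime_order_subgroup(23)
print(q, sorted(subgroup))  # 11 [1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18]
```

For the safe primes between 9000 and 11000 this yields subgroups of order q = (p - 1)/2 between roughly 4500 and 5500. Every element of such a subgroup has Legendre symbol 1, so the attack based on the Legendre symbol no longer applies there.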

It is remarkable that these subgroups are clearly safer for the DH exchange than any of
the other groups plotted in the figure. The results seem to confirm the conjecture in
Example 15.21. However, the test of uniformity was still rejected, although rejecting it
required a very large sample size, almost equal to the maximum value N^2.
Fig. 4. Comparing values of the test for different types of groups when 9000 < p < 11000. Top:
values for the prime subgroups of Z*_p when p is a safe prime. Middle: values for the safe
prime Z*_p. Bottom: values for all the other groups Z*_p in the given range. Values closer to zero
represent better groups for DH exchange.



For a better comparison, Figure 5 shows only the histogram of the values obtained for the
prime subgroups of the Z*_p with p a safe prime (top) and the histogram of the values
obtained for the Z*_p groups with p a safe prime between 9000 and 11000 (bottom).

The closeness of these values to each other is remarkable, considering that the order of
the group varies between 9000 and 11000, a 20% variation in size. This is an encouraging
fact, which suggests that for even larger p we may see the same sort of consistency in the
values. This would imply that groups with the same operational structure have similar
behavior from the point of view of Diffie-Hellman security. However, the values do vary,
as illustrated in Figure 4, which shows the histogram of the values obtained for the prime
subgroups of the Z*_p groups with p a safe prime between 9000 and 11000.
Fig. 5. A more detailed comparison of the previous image (Fig. 4). We compare the prime
subgroups with the corresponding safe prime groups. Values closer to zero represent safer groups
for DH exchange.



6. Conclusion and future work. 

In this article we present a novel statistical testing procedure to help assess the security
of the Diffie-Hellman key exchange protocol. The methods presented are quite general
and, to our knowledge, represent the first systematic, purely statistical approach to a
cryptographic problem. The article is intended to open the way for methods from the
statistical world to enter the cryptographic domain. We do not claim to settle the security
of the Diffie-Hellman key exchange protocol. What we have presented are primarily sufficient
conditions for its security, together with a way to compare the strength of these conditions
in different groups. In Section 5 we show that, among the groups we looked at, only the
prime subgroups of a large group are close to fulfilling the conditions considered.

Fig. 6. A blowup of the histogram of the values for the prime subgroups of the safe prime groups.
Note that the values are close to zero but not equal to zero.

An obvious gap in our results is the lack of a statistical analysis for very large primes.
Typically, the groups used in cryptography are of order at least 2^"^^. Applying our testing
procedure literally, as presented in Section 4, prevents such an analysis; however, we are
currently investigating ways to circumvent the permutation testing approach. One direction
is to approximate the distribution of the test in (12) with a multinomial distribution, and
then use a multivariate normal distribution as a second approximation. This should allow us
to calculate the p-value of the test directly, without the need for permutation testing.
Another direction is to pool outcomes into coarser groups and look at the distribution of
these groups of outcomes. This idea is similar in result to the approach of Canetti et al.
(1999) and Banks et al. (2006), and should allow us to speed up the procedure in order to
apply it to much larger groups. It should also allow us to look at the distribution of the
binary representation of prime subgroups of a large group, and to extend the methodology
to finite groups defined using elliptic curves.
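The sketch below gives a rough illustration of the two directions mentioned above. It pools outcomes into k contiguous bins and applies Pearson's chi-square goodness-of-fit test for uniformity. Pearson's statistic is the classical consequence of approximating multinomial counts by a multivariate normal distribution, which makes it asymptotically chi-square with k - 1 degrees of freedom under the null hypothesis. Here the p-value is further approximated with the Wilson-Hilferty normal transformation, so that only the standard library is needed. This is an assumed stand-in chosen for illustration, not the test statistic defined earlier in the paper. The helper name `coarse_uniformity_test` and the binning by value range are hypothetical choices.

```python
import math
import random
from collections import Counter


def _ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def coarse_uniformity_test(values: list[int], N: int, k: int) -> tuple[float, float]:
    """Pearson chi-square test that `values` (integers in 1..N) are uniform on 1..N,
    after pooling them into k contiguous bins. Returns (statistic, approximate p-value)."""
    if not 2 <= k <= N:
        raise ValueError("need 2 <= k <= N so that every bin is non-empty")
    n = len(values)
    observed = Counter((v - 1) * k // N for v in values)  # bin index 0..k-1
    stat = 0.0
    for j in range(k):
        # Bin j holds v - 1 = t with ceil(j*N/k) <= t < ceil((j+1)*N/k).
        size = _ceil_div((j + 1) * N, k) - _ceil_div(j * N, k)
        expected = n * size / N
        stat += (observed[j] - expected) ** 2 / expected
    df = k - 1
    # Wilson-Hilferty: (X / df)^(1/3) is approximately normal with mean
    # 1 - 2/(9 df) and variance 2/(9 df) when X is chi-square with df degrees of freedom.
    z = ((stat / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal
    return stat, p_value


if __name__ == "__main__":
    rng = random.Random(1)
    p, g = 1193, 3  # g = 3 generates Z*_1193
    keys = [pow(g, rng.randrange(p - 1) * rng.randrange(p - 1), p) for _ in range(20_000)]
    print(coarse_uniformity_test(keys, p - 1, 8))
```

Pooling by value range is only one possible grouping. It may hide structure such as the quadratic-residue imbalance, which is spread over all value ranges, so the choice of grouping matters as much as the approximation.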

7. Appendix 

We present the actual values obtained in Z*_p when p = 1193 in Table 1.
Table 1: Results for Z*_1193

Sample size n  Sample entropy value  p-value  Distance to center  Relative distance
59             0.046993              0.556
118            0.105734              0.904
354            0.280115              1        0.0867205869841293  0.602176619941004
885            0.532425              1        0.088729342259792   0.599662439145006
1829           0.96382               1        0.158395336513686   0.758210206038124
3304           1.40654               1        0.187918729961140   0.890429900342397
5428           1.82531               1        0.194266177572582   0.935988768705612
8319           2.19741               1        0.181355529391107   0.952411456549936
12095          2.55884               1        0.192004761885750   0.966980890255525
16874          2.87286               1        0.187337211259202   0.979981576295216
22774          3.1674                1        0.191522958630831   0.981517705830948
29913          3.43077               1        0.188465586935031   0.98875796589123
38409          3.67754               1        0.189706561218385   0.989433516631953
48380          3.90781               1        0.192416008075197   0.99165124060656
59944          4.11938               1        0.19204137533093    0.99478104337756
73219          4.31302               1        0.187468331653386   0.994755817256502
88323          4.50025               1        0.188526799093478   0.996560990251708
105374         4.67745               1        0.19031690687346    0.996854516374416
124490         4.84304               1        0.190069869416784   0.997275184401866
145789         5.00357               1        0.193382446593161   0.997349334367475
169389         5.14947               1        0.189808592541566   0.997931831476642
195408         5.29352               1        0.191440055573543   0.998590870156603
223964         5.42925               1        0.191148605525655   0.998411134116565
255175         5.55893               1        0.190698438462096   0.998845341861062
289159         5.68315               1        0.190158376706143   0.998956167303273
326034         5.80232               1        0.189542921352024   0.99921427003345
365918         5.91821               1        0.190224685898690   0.999099568272852
408929         6.03153               1        0.192585342387550   0.99931590521473
455185         6.13611               1        0.190147697214377   0.999413667702933
504804         6.2378                1        0.188502041891033   0.999534588253575
557904         6.34038               1        0.191174900210519   0.999622308526143
614603         6.43583               1        0.189935678716927   0.999625957882701
675019         6.53032               1        0.190747522920258   0.99976657291958
739270         6.62041               1        0.189993001424682   0.99970573975723
807474         6.70913               1        0.190533231900746   0.999813379231167
879749         6.79349               1        0.189228458019329   0.999814900993474
956213         6.87841               1        0.190858660137478   0.99984125150255
1036984        6.95729               1        0.188695781636576   0.999914186731075
1122180        7.03801               1        0.190502664400493   0.999926167521206
1211919        7.11461               1        0.190209994699298   0.99994757464427
1306319        7.18871               1        0.189337283972479   0.999979948549426